Using Clustering to Examine Inter-Individual Variability in Topography of Auditory Event-Related Potentials in Autism and Typical Development

Objective: Although prior studies have compared sensory ERP responses between groups of autistic and typically-developing participants, it is unclear how heterogeneity contributes to the results of these studies. The present study used examined individual differences in these responses. Method: 130 autistic children and 81 typically-developing children, aged between 2–5 years, listened to tones at four identity levels while 61-channel electroencephalography was recorded. Hierarchical clustering was used to group participants based on rescaled event-related potential (ERP) topographies between 51–350ms. Results: The hierarchical clustering analysis revealed substantial heterogeneity. Some of the seven clusters defined in this analysis were characterized by prolonged fronto-central positivities and/or weak or absent N2 negativities. However, many other participants fell into clusters in which N2 responses were present at varying latencies. Conclusions: Atypical response morphologies such as absent N2 responses and/or prolonged positive-going responses found in some autistic participants may account for prior research findings of attenuated N2 amplitudes in autism. However, there was also considerable overlap between groups, with participants of both groups appearing in all clusters. Significance: These results emphasize the utility of using clustering to explore individual differences in brain responses, which can expand on and clarify the results of analyses of group mean differences.


Introduction
Autism spectrum development (ASD) 1 is a heterogeneous neurominority. It is diagnosed behaviourally, and in contemporary guidelines, diagnosis is based on the presence of atypical social and communication behaviours, as well as at least some atypical non social behaviours, such as motor stereotypes ("stimming"), intensely focused interests, and/or atypical sensory reactivity (American Psychiatric Association, 2013). Although autism has traditionally been described using a spectrum metaphor, implying a single continuous dimension of greater to lesser severity, a more recent view describes autism as a constellation varying along multiple dimensions (Fletcher-Watson & Happé, 2019). The diversity within autism includes cognitive and behavioural heterogeneity along different dimensions, such as social-communication characteristics of autism (Georgiades et al., 2013); different dimensions of non-social autism characteristics such as intense interests and insistence on sameness (Grove, Begeer, Scheeren, Weiland, & Hoekstra, 2021); cognitive ability and different dimensions thereof such as verbal comprehension and fluid reasoning (Audras-Torrent et al., 2021); different forms of sensory sensitivity such as hyperacusis and misophonia (Williams, He, Cascio, & Woynaroski, 2020b); and the presence and severity of co-occurring conditions such as anxiety and depression (McCauley, Elias, & Lord, 2020); to name only a few. In general, autistic individuals differ from one another along at least as many distinct dimensions as typically-developing people. Moreover, there appears to be substantial neural heterogeneity in autism. For example, combining across research sites, functional connectivity appears to have only a relatively limited capacity to diagnostically separate autistic individuals from typically-developing controls (Lanka et al., 2019). Moreover, such binary diagnostic classification seems less demanding than attempting to discriminate between typical development (TD), autism, and other atypical neurotypes, such as attention-difference hyperactivity development (ADHD), 2 let alone combinations of co-occurring atypical neurotypes. In this context, it hardly seems surprising that some scholars in the social sciences emphasize socially constructed aspects of ASD, rather than the neurotype's biological coherence (Grinker, Yeargin-Allsopp, & Boyle, 2011).
This inter-individual variability may contain important information. For example, sensory processing in autism appears to be an area of profound real-world significance, being related not only to social cognition (Green, Hernandez, Bowman, Bookheimer, & Dapretto 2018), but also to quality of life (Lin & Huang, 2019) and anxiety (Green, Ben-Sasson, Soto, & Carter, 2012;Neil, Olsson, & Pellicano, 2016). Autistic sensory processing is also highly heterogeneous (Uljarević et al., 2017). It therefore seems eminently reasonable to expect that meaningful inter-individual variability exists in sensory-related brain responses in autism. For example, in a sample of 2-, 3-, and 4-year-old autistic and typically-developing children, Dwyer et al. (2020) examined neural heterogeneity by using clustering to group participants based on the relative strengths of topography-independent responses to sounds of different intensities. Substantial variability existed in both diagnostic groups, and notably, in ASD, relatively stronger responses to high-intensity sounds were related to caregiver-reported auditory distractibility/filtering problems. The present study extends the exploration of neural heterogeneity in the same sample studied by Dwyer et al (2020) with a focus on the spatiotemporal heterogeneity in electrocortical responses not captured in our initial examination of these data using Global Field Power (GFP).
Canonically speaking, young children in the 2-5 year age range of the present study exhibit two principal auditory ERP components over fronto-central channels, the P1 and N2 (Čeponienė, Lepistö, Alku, Aro, & Näätänen, 2003;Shafer, Yu & Wagner, 2015), as well as a large temporal negativity (Bruneau, Roux, Adrien, & Barthélémy, 1997;Shafer et al., 2015). This temporal negativity is variously referred to as Tb (Ponton, Eggermont, Khosla, Kwong, & Don, 2002), Na (Shafer et al., 2015), or N1 c (Bruneau et al., 1997); the present study refers to this response as the "Tb." Additional temporal responses have been described, but these are not consistently observed in the age range of the present study (Shafer et al., 2015). The fronto-central auditory N1 and P2 responses commonly associated with auditory ERPs are not typically apparent until later in development, around ages 9-14 (Albrecht, Suchodoletz, & Uwer, 2000;Gilley, Sharma, Dorman, & Martin, 2005;Ponton et al., 2002;Sharma, Kraus, McGee, & Nicol, 1997;cf. Wunderlich, Cone-Wesson, & Shepherd, 2006). There are also developmental changes in topography of auditory ERPs. While the P1 is observed fronto-centrally in young children such as those in the present study, the P1 and the later-emerging N1 shift to a more central topography by adulthood (Čeponienė, Rinne, & Näätänen, 2002). Source localization suggests the location of the equivalent cortical dipoles generating these responses are fairly similar, though delayed in children compared to adults, with origins in temporal auditory cortex (Albrecht et al., 2000, Parviainen, Helenius & Salmelin, 2019Ponton et al, 2002). Although developmental ERP studies sometimes report the proportion of individuals classified as displaying vs. not displaying a particular ERP component (e.g., Gilley et al., 2005;Shafer et al., 2015), more systematic examination of heterogeneity could uncover different subgroups of individuals with particular ERP morphologies and topographies. Observed scalp ERP waveforms are the consequence of the superposition of numerous synchronized cortical neurons (typically pyramidal cells) active as arrays of effective dipoles, which, as a function of dipole orientation and polarity related to folding of the cortical mantle, may cancel each other out (Luck, 2014). Individuals could differ in the relative strengths or contributions of these numerous active cortical effective dipoles to their observed ERP responses. Maturational changes, such as degree of myelination, could influence differences in the relative strength or importance of different regional cell populations contributing to observed scalp surface recordings. The orientation of dipoles could also vary, especially as brain structures continue to develop. In addition, prior research indicates that positions of EEG electrodes, relative to neuroanatomical features, vary across individuals (Homan, Herman, & Purdy, 1987). This would be expected to contribute additional variability in observed topographies beyond individual differences in patterns of cortical activation to a given stimulus.
Inter-individual variability due to such factors could complicate the interpretation of conventional ERP statistical analyses. For example, a finding that a particular ERP component has significantly lower amplitude in a particular group or condition could reflect any of a number of possibilities: a major dipole contributing to said component might be slightly weaker in all participants, said dipole might be substantially weaker in some participants but largely intact in others, another dipole with opposing polarity could be stronger, etc. The statistical analysis, however, would not distinguish between these possibilities. Descriptive techniques, such as using data-driven clustering to sort participants into subgroups that could illustrate different individual ERP patterns underlying grand-averaged patterns, could therefore provide additional information beyond that offered by conventional analyses.
The investigation of inter-individual variability might also help clarify inconsistencies in prior research regarding canonical ERP amplitudes in ASD. In some studies, no differences in amplitudes of auditory responses such as the P2 and N2 are observed between autistic and typically-developing groups (e.g., Andersson, Posserud, & Lundervold, 2013;Salmond, Vargha-Khadem, Gadian, de Haan, & Baldeweg, 2007). However, while Orekhova et al. (2008) found no evidence of group differences in P1 amplitudes to the first sound in pairs of clicks, Orekhova et al. (2009) found that amplitudes of other responses to these initial clicks were weaker in autistic participants: specifically, Tb amplitudes were lower over the right hemisphere and N2 amplitudes were weaker frontally. Other studies find that autistic participants have, relative to typically-developing participants, as well as non-autistic individuals with intellectual disabilities, bilaterally weaker Tb responses to pure tones across a range of sound intensities (Bruneau et al., 1999;Bruneau, Bonnet-Brilhault, Gomot, Adrien, & Barthélémy, 2003). Weaker N2 responses have also been observed in other studies (Donkers et al., 2020;Yu et al., 2018). Whitehouse and Bishop (2008) observed normal P1 and N2 amplitudes to nonspeech sounds (complex tones) in autism, but (at a trend level) diminished P1 and (significantly) diminished N2 amplitudes to speech sounds (vowels). While some of the inconsistency in these findings appears to reflect differences in experimental procedures and stimuli, it also seems likely that the heterogeneity of the autistic population could contribute to variability in results. There is also variability in findings related to ERP and event-related field latencies; although a number of studies have observed delayed auditory responses in ASD generally (Bruneau et al., 1999;Port et al., 2016;Roberts et al., 2010), specific subgroups might show particularly delayed responses .
If different latent populations within the autistic constellation exhibit different patterns of neural responses, different studies might obtain different results depending on which aspects of population variability they most heavily, but unknowingly, tap into, based on factors such as inclusion criteria and recruitment sources. With rare exceptions (e.g., DiStefano, Senturk, & Jeste, 2019;Dwyer et al., 2020), clustering analyses are not generally employed to investigate heterogeneity of electrophysiological responses in autism. This is unfortunate, as information from such analyses could potentially help researchers contextualize and better understand results at the level of group means.
The present study aims to use clustering to explore different patterns of auditory ERPs that may be seen within the autistic constellation, in addition to comparing autistic and typically developing groups in a search for mean differences. To provide sufficient participants for this clustering analysis, a large sample of young autistic and typically-developing children was recruited as part of the Autism Phenome Project (APP) at the UC Davis MIND Institute. ERPs in this sample were recorded in response to sounds of different intensities (50, 60, 70, and 80 dB). This dataset has previously been used to cluster children based on the relative strengths of topography-independent responses (Global Field Power, GFP) from an early time window approximately corresponding to that of the P1 response (Dwyer et al., 2020); the present study extends this work by not only including responses in sliding time intervals across an expanded time window, but also by including information about the topography of responses over different scalp regions.
It is important to note that the present study's use of clustering is not meant to imply that the subgroups defined here exist as discrete categorical entities. The question asked by algorithms aiming to determine an optimal number of clusters is ill-posed (Fushing & McAssey, 2010), at least when clusters are not convex and well-separated. When clusters are poorly separated and non-convex, this may imply that the structure of the data is in fact dimensional rather than categorical, in which case the idea of an optimal or true number of clusters becomes problematic. (Conversely, when clusters are clearly convex, this may be visually obvious, rendering clustering analyses potentially superfluous.) Thus, we view the use of clustering procedures as a descriptive data exploration technique that can complement conventional ERP analyses. We are not claiming that the particular clusters revealed here are inherent in any population of similarly distributed individuals. Rather, we aim to parse interindividual variability of electrocortical responses in this dataset into bounded subgroups that would ordinarily be lumped together. Thus, imposing categories on dimensional data may nonetheless yield valuable descriptive information, and different clustering solutions with varying numbers of categories at different hierarchical levels may offer valid descriptions of the same data for different purposes. We have also chosen to cluster autistic and typically-developing participants together, so that results in each diagnostic group can be placed in context through comparison to the other group.
Although the use of clustering in the present study should be regarded as exploratory, prior research reporting that Tb and N2 amplitudes are attenuated in ASD (reviewed by Williams, Abdelmessih, Key, & Woynaroski, 2020a) could suggest that atypical response morphologies in time windows and over scalp regions canonically associated with these ERP components will be observed in some subgroups within ASD, while other subgroups will show canonically-expected patterns. Indeed, prior research with the present dataset suggests that amplitudes over the spatiotemporal window of the N2 are indeed attenuated in ASD (Dwyer et al., 2021). As such, we predict:

1.
Some clusters containing a disproportionate number of autistic participants will be characterized by atypical topographies and weaker/less negative N2 responses, although other autistic participants will be placed with predominantly typically-developing participants in clusters with more typical topographies and robust N2 responses.

2.
Some clusters will show evidence of ERP responses with morphologies and topographies that differ from those canonically described in prior research with children in this age range, such that participants' responses cannot be easily described in terms of these canonical patterns.

Participants
As part of the APP, attempts were made to collect ERP data from 243 autistic and 96 typically-developing children, aged between 2-5. Autistic participants were required to meet criteria for a pervasive developmental disorder (based on DSM-IV and Collaborative Programs of Excellence in Autism Network criteria) and reach ADOS-G (Lord et al., 2000) autism spectrum cut-off scores as well as cut-offs for either the social or communication subscales of the ADI-R (Lord, Rutter, & Le Couteur, 1994). Further information about the APP and participant recruitment can be found in previous publications (e.g., Libero et al., 2016;Nordahl et al., 2011;De Meo-Monteil, 2019;Dwyer et al, 2020). A number of participants were excluded from the present study due to failure to collect data, due to noisy data, due to an insufficient number of acceptable-quality trials (<400), due to an excessive number of poor-quality channels (>6-7), or due to the presence of neuroanatomical abnormalities revealed by magnetic resonance imaging collected in the APP. One participant entered the study in the typically-developing group but was diagnosed with autism at a later APP time-point; this participant's data are also excluded. The final sample of children with usable electrophysiological data includes 81 typically-developing participants (52 male) and 130 autistic participants (110 male) (Table 1). Families received a gift card in return for their participation in the study. The study was approved by the UC Davis Institutional Review Board and informed consent was obtained from the parent/guardian of each participant.

Measures
Cognitive ability was measured with the Mullen Scales of Early Learning (MSEL; Mullen, 1995). In the APP, four MSEL subscales were administered: Visual Reception (VR), Fine Motor (FM), Expressive Language (EL), and Receptive Language (RL). A ratio developmental quotient (DQ) was calculated by dividing mental age by chronological age, then multiplying by 100. MSEL data are available from all 130 autistic participants with usable electrophysiological data, and for 80 of the 81 typically-developing participants.
Adaptive functioning was assessed with the parent-report form of the Vineland Adaptive Behavior Scales, Second Edition (VABS-II; Sparrow, Cichetti, & Balla, 2005). The standardized composite adaptive behaviour score was used to index individuals' overall adaptive functioning for analyses. VABS scores were available from 105 autistic (97 male) and 69 typically-developing participants (43 male).
The Short Sensory Profile (SSP; McIntosh, Miller, & Shyu, 1999) was collected as a measure of caregiver-reported sensory behaviours. Higher scores reflect more typical sensory behaviours, while lower scores reflect atypical sensory behaviours that may be problematic. Although the original seven SSP subscales were defined in a typically developing sample (McIntosh et al., 1999), two studies have investigated SSP factors in samples of autistic children (Tomchek, Huebner, & Dunn, 2014;Williams, Failla, Gotham, Woynaroski, & Cascio, 2018). The present study employs the more recent nine-factor solution developed by Williams et al. Specifically, the present study examines SSP total scores (calculated on the basis of all 38 items) and the three factor scores that appear to tap into the auditory modality, namely: Auditory Distractibility, Hyporesponsiveness to Speech, and Noise Distress. Complete SSP data were available from 98 autistic (81 male, M Age = 38.87 months) and 65 TD participants (42 male, M Age = 37.28 months). Partial data that included some or all of the factors of interest were available from another nine participants.

EEG Task
Participants were seated on a caregiver's lap in a dimly-lit, audiometrically-quiet, shielded chamber. Stimuli were 50ms (including 5ms rise and decay time) complex tones, each consisting of sine waves of equal amplitude overlaid at the following 7 frequencies (musical notes): 249 Hz (B3); 616 Hz (D5), 788 Hz (G5), 1042 Hz (C6), 1410 Hz (F6), 1952 Hz (B6), and 2749 Hz (F7). All tones were identical to one another in terms of the proportion of stimulus energy drawn from each frequency, and thus were identical in terms of frequency spectra. However, tones randomly varied in intensity (50 dB, 60 dB, 70 dB, and 80 dB SPL); tones of the same intensity were never presented twice in succession. Tones were presented at a randomly variable ISI of 1-2s using Sony MDR-222KD headphones calibrated with a B&K artificial ear (model 4153) and sound meter (model 2229). While they passively listened to these tones, participants watched a quiet video of their or their caregiver's choice. Approximately 1200 trials (~300 trials/condition) were collected from each participant, with breaks included when required. Further details regarding the experimental setup are available in De Meo-Monteil et al. (2019).

EEG Data Acquisition and Processing
EEG was collected with a 61-channel cap (www.easycap.de) and a Compumedics Neuroscan Synamp II amplifier. Data were sampled at a rate of 1000Hz with Cz as a reference. Data were then average-referenced and filtered offline with a low cut-off of 0.4 Hz (12dB/octave roll-off). Given the study's goal of exploring individual differences in ERP data, we sought to maximize the event-related signal-to-noise ratio by removing putatively non-neural signal sources from the data using an intensive data processing pipeline. Epochs (spanning −200ms to 900ms, including 300ms necessary for subsequent independent components analysis) were screened and extreme amplitudes removed using the artifact scan tool of BESA 5.2 (www.besa.de), amplitude thresholds were adjusted manually to optimize retention of usable data and rejection of extreme artefacts (e.g., temporary channel disconnection, gross movements). Mean amplitude thresholds were 316.76 μV (SD = 98.74) in ASD and 303.81 μV (SD = 95.09) in TD. Data were then manually inspected and clear artefacts (e.g., temporary channel disconnections) not removed by the amplitude threshold were rejected manually. On average, in the ASD group, 23% of trials were removed in this process, compared to 19% in the TD group (see also ) independent components analysis. A semi-automatic artifact removal tool (SMART, https://stanford.edu/~saggar/Software.html) was used to identify signal sources from SOBI that were manually interpreted, on the basis of outputs depicting signal source topography, spectra, autocorrelation, and time series, to be of non-neural origin (such as EMG, EOG, and blinks). This SOBI analysis was performed separately on the first and second halves of the data from each participant, consistent with recommendations (Luck, 2014). Additional details regarding artifact removal using SOBI and SMART are discussed in Saggar et al. (2012). Artifact-free trials were then reconstructed from the putatively neural SOBI signal sources and inspected to verify a lack of noise; putative noise reconstructions were also created to verify absence of neural signal therein. Finally, separate averages for each of the four intensity conditions were computed for each subject. Data from excluded channels were interpolated using a spherical spline (Perrin, Pernier, Bertrand, Giard, & Echallier, 1987). Epochs (now spanning 100ms pre-stimulus onset to 600ms post-stimulus onset) were filtered (second-order Butterworth with −12dB/octave roll/off; 40Hz low-pass; 60Hz notch) and baseline-corrected using the pre-stimulus period with Cartool software (Brunet et al., 2011).

EEG Data Analysis
In order to focus on inter-individual differences in neural responses, and given that the absolute strength of observed ERPs can be influenced by non-neural biophysical factors such as skull thickness (Frodl et al, 2001), we chose to rescale the data to emphasize differences in response strength between intensity conditions, as well as response topography. Therefore, separately at each time-point, each participant's data were rescaled such that the individual participant's highest ERP amplitude in any condition or channel at that time point became 1, while the lowest (most negative) amplitude became 0.
Seven regions of interest across the scalp were defined: a central region, left and right frontal regions, left and right temporal regions, and left and right posterior regions (Fig. 1). These regions were selected based on visual inspection of grand-average topographies: the central and frontal regions aimed to capture the P1 and N2 responses while the temporal regions aimed to capture the Tb response. Each participant's rescaled ERP responses were averaged across electrodes within the seven regions of interest in consecutive 25ms time-windows between 51 -350ms; these data were then submitted to the hierarchical clustering analysis. Data from electrodes lying on the edge of the cap, as well as certain channels lying between regions, were discarded.
In that analysis, we used Ward's method to hierarchically cluster participants based on the topography and strength of their auditory electrophysiological responses. Ward's method describes clusters in multivariate Euclidean space by successively adding clusters together in order to minimize their variance. This hierarchical process begins with clusters representing a single participant, but clusters grow as they are combined with further clusters until the entire dataset is contained within a single cluster. The process generates a dendrogram depicting the clusters that exist at different hierarchical levels.
The number of clusters were determined based on Euclidean distances between clusters (reflected in the height of the dendrogram branches), the interpretability and meaning of the clusters, and the stability of the results when subsamples of participants were repeatedly extracted and re-clustered (see Appendix B).
To further describe and interpret the ERPs found in the clusters, we also used traditional component-based analyses to statistically compare ERPs across clusters. These results are presented in Appendix C.
Exploratory Kruskal-Wallis and Wilcoxon-Mann-Whitney tests were used to compare clusters on chronological age, MSEL DQ, VABS composite scores, and SSP total, auditory distractibility, hyporesponsiveness to speech, and noise distress scores.

Results
For the purposes of the present analysis, we decided a seven-cluster solution appeared to offer the best description of the data ( Fig. 2; see also raw voltage heatmap in Supplementary  Fig. A.1). We designated these clusters Topographic Clusters 1-7 (TC1-TC7). The rescaled topography patterns for each cluster (depicted in separate sections of the figure, see caption for details), intensity (represented by rows), and consecutive 25 ms averaged time windows (represented by columns) are also displayed more accessibly in Fig. 3; these patterns will be described more completely later in the results section.

Cluster Membership by Diagnostic Group and Sex.
Proportions of participants in each diagnostic group did not significantly vary across clusters in the seven-group solution, X 2 (6, N = 211) = 9.98, p = .13 (Table 3). Given visually apparent trends for there to be relatively few typically-developing participants in clusters such as TC6 and TC7 (see the left column of Fig. 2; autistic participants are represented in gold and typically-developing participants in blue), this may reflect the fairly low statistical power of the chi-square statistic. There was no evidence suggesting cluster membership varied by sex in either autistic, X 2 (6, N = 130) = 0.97, p = .99, or typically-developing, X 2 (6, N = 81) = 7.85, p = .25 individuals (Table 4).

P1 Topographies.
In Fig. 3, the P1 response can be seen fronto-centrally in head plots from time window columns towards the left of each cluster (particularly the second to fourth from the left, around ~76 -150 ms; see also raw voltage topographies in Supplementary Fig. A.2).
Generally speaking, the topographies of the P1 response were similar across clusters, and, at least for the 76-100 ms (2 nd ) column, show evidence of intensity-dependence. The P1 response appeared markedly weaker in TC2 than in other clusters, especially over medial channels. In addition, in TC5, responses to 50 dB, 60 dB, and 70 dB sounds (the top three intensity rows in Fig. 3) appeared relatively weaker in comparison to the 80 dB response (from the bottom intensity row) than in other clusters.

Later Positivity Topographies
However, in some clusters, central and fronto-central positivities were apparent even after the canonical P1 time window. For example, in TC3, high rescaled amplitudes (per Fig. 3) are clearly apparent, particularly frontally, as late as ~251-275 ms in the 50 dB condition, with similar if lesser prolongations to higher-intensity sounds. In Fig. 4 (see also raw voltage waveforms in Supplementary Figs. A.3-A.6), which depicts rescaled waveforms (TC3 in black) over each of the seven measurement regions (region boundaries given in Fig. 1), these rescaled amplitudes appear to decline gradually from the P1 peak, although in some intensity conditions this gradual decline is interrupted by temporary plateaux approximately coinciding with the temporal Tb negativity.
A slightly different pattern is apparent in TC2, where late high rescaled amplitudes (per Fig.  3) appear to have a more central than frontal distribution. This central positivity appears not to reflect a simple prolongation of the P1 as in TC3, but a separate and distinct response (as can be seen predominantly at mid-central locations in the waveforms from Fig. 4, Supplementary Figs. A.3-A.6; TC2 waveforms are cyan). It also noteworthy that this central positivity in TC2 approximately coincides with the temporal Tb negativity.
T6 may exhibit the most unusual pattern. As in TC3, the P1 response in TC6 appears to continue frontocentrally -especially frontally -long after the P1 time window. However, where the frontal positivity in TC3 does eventually end, the frontal positivity in TC6 continues through to the end of the clustering time window, and it is therefore apparent in all columns of the TC6 section of Fig. 3 from ~51-350 ms. It is also noteworthy that these late frontal positivities appear less intensity dependent than the earlier P1 activation. This pattern is less prominent, but still visible, in raw voltage topographies from Supplementary Fig. A.2.
Finally, TC7 also shows late positive fronto-central voltages that are inconsistent with the canonical morphologies of cortical auditory components in young children. In the rightmost five columns of the TC7 section of Fig. 3, high rescaled amplitudes are apparent in some intensity rows. They are most prominent in the 60 dB condition (second row), where a clear positivity is apparent in raw voltages ( Supplementary Fig. A.2).

Tb Topographies.
Like the P1 component in its canonical window, the temporal Tb negativity generally appears fairly similar across clusters, but there are a few important between-cluster differences. Furthermore, the Tb response is most prominent in responses to higher-intensity stimuli, such as 70 dB or 80 dB sounds. In contrast to the P1, Tb responses to soft 50 dB sounds are difficult to discern in rescaled topographies (Fig. 3) or waveforms (Fig. 4).
Inspection of the temporal region panels (left and right subplots from the middle rows for each intensity) in Fig. 4 reveals that TC6 (green waveform) appears to show weak Tb responses across intensities, but based on statistical results in Appendix C, this may reflect contamination from the sustained frontocentral positivities found in TC6. Instead, the results in Appendix C suggest that raw Tb amplitudes are, in the higher intensity conditions where the Tb response is clearly discernable, particularly strong in TC1 and TC2 compared to other clusters.
Inspection of topographies from Fig. 3 also appears to suggest some substantial between cluster differences in Tb latencies. For example, in the 80 dB condition (bottom row), the Tb seems to be visible in TC2 from ~151-225 ms, and perhaps as late as ~226-250 ms. In contrast, in clusters such as TC1 and TC5, the Tb appears visible from 126-200 ms. The Tb is visible in TC4 only in the windows from ~126-175 ms. However, inspection of temporal waveforms (Fig. 4) does not appear to suggest substantial Tb latency differences between TC4 and other clusters, perhaps suggesting that the Tb responses in TC4 was subsumed into the early frontocentral N2 negativity that appears in that cluster.

N2 Topographies.
The N2 response was not consistently evoked in the present study. Although clear N2 responses in the 70 dB and 80 dB conditions (lower two rows of topography plots in Fig.  3) can be seen in some clusters from as early as ~176-200 ms through to ~326-350 ms, the N2 is seldom visible in the 60 dB condition and is never clearly discernable in the 50 dB condition.
Many of topographic differences between the clusters appeared to be related late fronto central voltages in or shortly prior to the latency range of the canonical N2 negativity. Some of these differences were stark. Particularly clear N2 responses can be seen fronto-centrally in clusters such as TC1 and TC4, and these patterns -highly consistent with canonical patterns -differ sharply from those in some other clusters. As noted previously, TC6 exhibited generally positive frontal voltages throughout the entire time window used in the clustering analysis (~51-350 ms). The only suggestion of any sort of N2 negativity in TC6 was a slight central dip in rescaled voltages in the 80 dB condition (bottom row of TC6 section in Fig. 3), which can be seen in time windows from ~201-350 ms (columns 7-12), but these still coincided with a frontal positivity. Thus, the canonical N2 response appeared absent in TC6.
TC7 offered another unusual case. In TC7, topographies around ~176-200 ms from the 80 dB condition (bottom row in Fig. 3) appear somewhat consistent with a frontal N2 topography. However, given the presence of strong temporal Tb negativities in time windows on either side of the ~176-200 ms period, it seems possible that this "N2" may actually have been part of the Tb response, spread across frontal channels by volume conduction. If so, then TC7 also appears to have lacked a clear N2 response. Fronto-central voltages in TC7 from later time windows seem to have been positive or neutral.
N2 responses varied a great deal across clusters even where they were present. One major between-cluster difference came in the form of latency. N2 responses in TC4 were apparent in topographies very early, around ~176-200 ms in the 60 dB, 70 dB, and 80 dB conditions (bottom three rows in the TC4 section of Fig. 3). Similarly, in TC5, the N2 response was clearly visible in Fig. 3 in the 70 dB condition (third row of TC5 section) from around ~176-200 ms. to ~276-300 ms. In contrast to TC5, the N2 appeared slightly later in TC1 (around ~226-250 ms, and was markedly increased in the 80 dB condition relative to lower intensities, (bottom row of the TC1 section of Fig. 3). The N2 did not appear in TC2 until around ~251-275 ms. In TC3, low/negative central voltages were apparent at that latency, and these had broadened to include frontal channels by ~301-325 ms (the penultimate column of Fig. 3). If the various frontocentral negativities between ~176 and ~350 ms can indeed all be categorized as "N2" responses, then these considerable delays in apparent N2 latencies seem to have permitted other responses to summate over the scalp instead. The central and frontal positivities appearing in TC2 and TC3, respectively -which were previously discussed in the section on topographies of the P1 and other positivitiesappeared in the long period before the delayed onset of the N2 in these two clusters.
Clusters also appeared to differ in regard to which sound intensities evoked N2 responses. In TC1, N2 responses were evoked by 70 dB sounds (row 3 of TC1 section of Fig. 3), but these were relatively much weaker than responses to 80 dB sounds (row 4); the N2 response to 60 dB sounds in TC1 was considerably fainter. In contrast, in TC2 and TC4, N2 responses to 60 dB, 70 dB, and 80 dB sounds (rows 2-4 of TC2 and TC4 sections of Fig. 3) appeared much more comparable in rescaled amplitude, although the responses to higher intensities still appeared larger and, at least in the case of TC4, more prolonged. The N2 response in TC5 was especially unusual in its intensity-dependency. Although there was little evidence of an N2 response to high-intensity 80 dB sounds in TC5 (row 4 of TC5 section of Fig. 3), a clear N2 response was elicited by softer 70 dB sounds (row 3).

Effect of Cluster Membership on Sensory, Cognitive, and Adaptive Function Measures.
Although there was no effect of SSP total score, auditory distractibility score, or hyporesponsiveness to speech score (see Table 5), SSP noise distress scores of typically developing participants significantly varied across clusters, Kruskal-Wallis H (6) = 14.60, p = .02 (Fig. 5d). Furthermore, the SSP noise distress scores of autistic participants varied across clusters, H (6) = 13.16, p = .04 (Fig. 5c). However, post-hoc effects did not approach significance in either diagnostic group after correction for multiple comparisons. In addition, apparent trends in either diagnostic group often appeared to reverse direction in the other diagnostic group, which makes interpretation of these data unclear.
There were no significant effects of MSEL DQ or VABS composite score in either diagnostic group or subgroup solution (Table 5).

Discussion
The present study used hierarchical clustering to describe subgroups of participants in terms of the rescaled scalp topographies of their ERP responses over a broad time window (51-350 ms). Interestingly, most of the variability between clusters observed in the clustering analysis appeared to relate to the topography of responses in the time window canonically associated with the fronto-central N2 response in this age range (i.e., after about ~200 ms; e.g., Shafer et al., 2015, use a 220-388 ms window). Previous research using the same dataset as the present study suggests that this response is attenuated in ASD (Dwyer et al., 2021). The heterogeneous patterns described by the clustering analysis appear to have considerable relevance for the interpretation of this group mean difference. Instead of reflecting a simple reduction in the strength of the N2 negativity in ASD, the mean level difference between diagnostic groups might at least partly reflect the influence of positive-going responses simultaneously summating over the scalp. The dipole(s) generating these positivities might be relatively stronger in a subset of autistic participants, or the weakness of negative-going responses in these participants might reveal positivities that would otherwise have been masked, or some combination of both factors may be involved.
The present study's focus on the canonical N2 time window appears to highlight different dimensions of neural variability than those reported on previously by Dwyer et al. (2020), who focused on the topography-independent strengths of Global Field Power responses in the P1 time window across different intensity conditions.

Responses in Canonical Tb Spatiotemporal Window.
Some differences in Tb response patterns were observed across clusters. For example, the Tb response appeared strong in TC1 and prolonged in TC2. However, between cluster differences in Tb responses were relatively modest, with all clusters exhibiting, in waveforms, clear Tb responses to at least high-intensity sounds.

Responses in Canonical N2 Spatiotemporal Window.
However, response patterns in the spatiotemporal window canonically associated with the N2 response differed starkly across clusters. Notably, an N2 response is clearly evoked by 60 dB sounds in TC2, TC4, and TC5, as can be seen in scalp topographies from Fig. 3. However, in TC1, the fronto-central N2 was relatively weak in the 60 dB condition and increased markedly in strength with higher-intensity 70 dB and 80 dB stimuli. Surprisingly, in TC5, there was a clear N2 response to 70 dB stimuli, but there was little evidence of an N2 response to high-intensity 80 dB sounds.
Perhaps even more tellingly, some subgroups exhibit positive-going responses in the time window canonically associated with N2; most strikingly, TC6 does not appear to display a frontocentral N2 response to 70 dB and 80 dB sounds, except perhaps for a slight dip in central amplitudes in response to 80 dB sounds . Instead, in all conditions, TC6 appears to display a sustained frontocentral or frontal positive-going response that begins with the P1 and continues, albeit with reduced amplitudes, until the end of the time window used in the clustering analysis (350 ms). For example, while an early and brief N2 response is visible in TC7 in at least some conditions, this is followed by a positive-going deflection as well as, in the 60 dB condition, positive raw amplitudes (see Supplementary  Figs. A.2, A.4). Similarly, in TC2, the initial P1 positivity is in all conditions followed, after a negative deflection, by a renewed positivity, particularly over central sites. Although the N2 sometimes (in higher intensity conditions) follows this second positivity in TC2, the positivity does fall partly within the time window used for ANOVA analyses of the canonical N2 at the mean level. The existence of these unusual positive-going responses appears to support our second hypothesis (i.e., that some clusters would exhibit responses with morphologies and topographies not consistent with canonical components).
The presence of these positive-going responses emphasizes the importance of superposition in event-related potential studies. As noted earlier, ERPs are what remains after potentially numerous dipoles from different neural generators summate across the scalp. It seems quite likely that some of the different observed ERP morphologies seen across different subgroups from this analysis might reflect changes in the relative strengths of a number of differently oriented dipoles (associated with either positive or negative voltages at particular scalp sites); such differences in strength could then result in either positive or negative voltage amplitudes over different regions of the scalp.
Moreover, a large number of autistic participants were observed in some of these clusters. For example, there appeared to be many typically-developing participants in TC1 and TC4, and many autistic participants in TC6 and TC7. Although we did not find significant differences in proportions of participants across clusters, this may have reflected the limited power of the chi-square statistic. This makes it difficult to draw clear conclusions regarding the first hypothesis of the present study, which suggested that autistic participants would be more likely to fall into clusters with weaker or absent N2 responses, although some other autistic participants would display responses comparable to typically-developing participants.
As noted earlier, both TC6 and TC7 exhibited ERP response patterns differing from the canonically-expected pattern of negative amplitudes in the N2 time window. A large number of autistic participants also appeared in TC2, with its second central positivity following the P1.
That being said, both typically-developing and autistic participants were present in all clusters, and the lack of a significant chi-square effect highlights the large extent of overlap between diagnostic groups in the distributions of their electrophysiological event-related responses, consistent with the findings in Dwyer et al. (2020) in the same sample as the present study.

Effects of Cluster Membership on Age and Measured Variables
Interestingly, exploratory analyses comparing clusters on other variables found that autistic participants in TC6 appeared to be unusually young. However, the age effect observed in autism is ambiguous due to the cross-sectional nature of the present study: it is possible, for example, that the younger autistic participants in TC6 might have been able to participate in this study due to being diagnosed earlier than other participants. An early diagnosis might be due to differences in any aspects of behavioural phenotypes or family background not explored in this study, as age of diagnosis in autism is systematically related to other variables (Daniels & Mandell, 2014;Zwaigenbaum et al., 2019).
Exploratory analyses also suggested that the SSP noise distress scores of both autistic and typically-developing participants differed across clusters. However, when the patterns of these significant effects are examined (see Figs. 5C-5D), it is apparent that some clusters characterized by low SSP scores in TD are characterized by high scores in ASD, and vice versa. While one could develop "just-so" stories to explain this apparent interaction, a more prosaic interpretation is simply that the observed effects of SSP noise distress scores reflect Type I error due to the large number of exploratory comparisons run in the present study, and that they should not be further interpreted.

Limitations
The present study has a number of strengths, including its large and well-characterized sample, the large number of trials collected from each participant, and the use of an intensive data processing pipeline including SOBI independent components analysis to remove putative sources of noise from the final data. However, it is not without limitations.
For example, some age effects were obtained in the present study, and given the cross sectional nature of the study it is difficult to decisively determine whether these reflect true developmental change or another variable confounded with age. To address this limitation, the authors are currently collecting data in the same experimental paradigm from a new longitudinal sample through the Autism Centers of Excellence-funded Brain Research in Autism Investigating Neurophenotypes (BRAIN) study.
In addition, the present study did not involve collection of hearing acuity measures, due to the difficulty of obtaining reliable estimates of hearing acuity in young children. However, prior research suggests that there is substantial variability in hearing acuity within ASD (Demopolous & Lewine, 2016;Rosenhall et al., 1999). It is unclear how hearing acuity might relate to the individual differences observed in this study.
It should also be noted that the present study's approach to investigating variability in topography of auditory responses in ASD is only one approach. There are other approaches to topographic analysis, such as TANOVA (see Murray, Brunet, & Michel, 2008); researchers might also choose to attempt to reconstruct sources underlying scalp topographies. Moreover, while the present study investigated intensity-dependency of responses, it does not manipulate dimensions such as frequency or background noise.
Furthermore, the present study only evaluated the stability and replicability of its clustering solution through repeatedly re-sampling and re-clustering subsets of participants (see Appendix B). This approach is limited in the sense that it cannot determine whether the clustering results obtained in the present sample will replicate in other samples. However, the BRAIN study could allow for the replicability of the present study's results to be investigated.

Summary
The present study used hierarchical clustering to describe considerable inter-individual variability in the topographies of ERP responses to auditory stimuli in ASD and TD. These results complement and contextualize prior findings of reduced N2 response amplitudes in ASD (see Williams et al., 2020a), including such findings from the dataset used in the present study (Dwyer et al., 2021); on the basis of our results, these findings may to be driven in large part by subgroups with largely absent N2 responses and sometimes, indeed, apparent positive-going responses in the time window associated with the N2. Furthermore, many other autistic participants display responses that cannot be easily distinguished from those of most typically-developing participants, while some typically-developing participants exist in clusters displaying atypical event-related responses. Indeed, the extent of the overlap between diagnostic groups was so considerable that no significant differences in proportions of participants from each group across clusters were observed, despite some visually-apparent trends. This overlap between diagnostic groups, combined with the diverse patterns of neural responses observed across different clusters, emphasizes the existence and importance of inter-individual variability within neurotypes such as ASD and TD. We suggest that clustering analyses similar to those included in the present study may be productively used by other authors to better characterize and understand heterogeneity in their own functional neuroimaging studies of ASD and other neurotypes, thereby placing findings of group mean differences into better context.

Supplementary Material
Refer to Web version on PubMed Central for supplementary material.

Highlights
• Large individual differences in loudness-dependent auditory ERP topographies exist in both autism and typical development • These individual differences appear particularly clearly in the spatiotemporal window of the N2 response • These individual differences underlie and can clarify and expand on patterns of group mean differences described in prior research  ERP amplitudes, rescaled within participants and time-points to range between 0 and 1 across conditions and channels, as clustered using Ward's method.  ERP amplitudes, rescaled within participants and time-points to range between 0 and 1 across conditions and channels, averaged across each cluster and intensity, collapsed across diagnostic groups, spherically splined from 61 channels and plotted on a head model, in consecutive 25 ms time windows from 51-350 ms. All clusters appear to display frontocentral P1 responses around ~76-175 ms, though with lower amplitude in TC2. Interestingly, in TC6, the frontal P1 response appears extremely prolonged, with high rescaled voltages being visible across all time windows and intensities from ~76-350 ms. TC7 also displays late high frontocentral rescaled amplitudes, but these only become visible after ~225 ms and are less visible in the 80 dB condition. In TC2, the P1 appears to be followed by a second period of high rescaled amplitudes over central sites around ~176-250 ms (i.e., a central positivity, as demonstrated by raw amplitudes topographies in Fig. 4); a prolongation of the P1 can be observed in TC3 up to as late as ~251-275 ms. Frontocentral  N2 responses are visible in TC1, TC2, TC3, TC4, and TC5, and to some extent TC3, but onset latencies vary substantially from around ~176-200 ms (TC4) to ~301-325 ms (TC3). Some suggestion of an N2 is visible in TC7 (~176-200 ms), though this might be better interpreted as a Tb response. The N2 is generally evoked mainly in response to high-intensity 70 dB and 80 dB sounds, but clear N2 responses to soft 60 dB sounds are visible in TC2, TC4, and TC5. Strangely, there is little evidence of an N2 response to high-intensity 80 dB sounds in TC5.

Fig. 4.
ERP amplitude waveforms over electrodes in each of the seven regions depicted in Fig.  1, rescaled within participants and time-points to range between 0 and 1 across conditions and channels, averaged across each cluster from the seven-cluster solution, collapsed across diagnostic groups. The Y-axis (vertical line on each subplot) ranges from 0.15 to 0.85, with tick marks for every 0.10 rescaled amplitude units. The X-axis ranges from 0 to 350 ms, with tick marks each 50 ms. A (top left). Rescaled amplitude waveforms evoked by 50 dB stimuli. All clusters appear to display frontocentral P1 responses. Interestingly, TC6 displays continuous high rescaled amplitudes (suggestive of positive raw voltages) frontally, perhaps suggestive of a very prolonged P1 response. In TC2, the P1 is followed by a second period of high rescaled amplitudes (suggestive of a positivity) over the central region, and to a lesser extent frontally. The central P1 appears somewhat prolonged in TC1 and TC3. It is difficult to discern any clear N2 response in this condition, but frontocentral rescaled amplitudes appear particularly low in TC4.
B (top right). Rescaled amplitude waveforms evoked by 60 dB stimuli. In TC7, a period of low rescaled frontal amplitudes (i.e., a negativity, perhaps an early N2) following the P1 terminates early and appears to be followed by a frontocentral positivity. As in the 50 dB condition, a prolonged frontal positivity is visible in TC6. A second positivity following P1 appears evident in TC2, while the P1 appears prolonged in TC1 and TC3. The temporal Tb response appears to be visible in some clusters, especially TC1, TC4, TC6, and TC7. In comparison to the 50 dB condition, the N2 response appears more distinct in some clusters. Frontocentral N2 responses seem evident in TC4 and TC5, and a later N2 is visible frontally in TC2; rescaled amplitudes also appear somewhat low frontally in TC1 and centrally in TC3. C (bottom left). Rescaled amplitude waveforms evoked by 70 dB stimuli. A prolonged fronto-central positivity appears to be visible in TC6. A second positivity following P1 appears evident in TC2, while the P1 seems prolonged in TC1 and TC3. Particularly robust and sustained frontocentral N2 responses are evident in TC1, TC4, and TC5, while the N2 response appears somewhat later in TC2 and TC3. As in the 60 dB condition, there is an early frontal negativity in TC7, followed by little further activity. D (bottom right). Rescaled amplitude waveforms evoked by 80 dB stimuli. Patterns are generally similar to those from the 70 dB condition, with some exceptions. The prolonged positivity in TC6 is visible frontally but, compared to the 70 dB condition, is less evident over the central region. In TC5, the N2 appears primarily over the central region; in contrast to the 70 dB condition, there is little evidence of an N2 response to 80 dB stimuli over frontal regions in TC5.   Mean and standard deviation of number of trials retained in final averages after all data processing was completed by group and condition.  Counts and percentages of autistic and typically-developing participants, separately, in cluster groups from the seven-cluster solution. Counts and percentages of autistic and typically-developing participants, separately, in cluster groups from the seven-cluster solution. TC1  TC2  TC3  TC4  TC5  TC6